Abstract. Recommender systems are popular in e-commerce as they suggest
items of interest to users. Researchers have addressed the cold-start
problem where either the user or the item is new. However, the situation
where both the user and the item are new has seldom been considered. In
this paper, we propose a cold-start recommendation approach to this
situation based on granular association rules. Specifically, we provide a
means for describing users and items through information granules, a
means for generating association rules between users and items, and a
means for recommending items to users using these rules. Experiments are
undertaken on the publicly available MovieLens dataset. Results indicate
that rule sets perform similarly on the training and the testing sets, and
that the appropriate setting of granules is essential to the application
of granular association rules.

Keywords: Granular computing, granule, recommendation, association 
rule. 



1 Introduction 

Recommender systems suggest items of interest to users and therefore serve
as an essential part of e-commerce. To date, hundreds of methods have been
proposed. Collaborative filtering methods [1,2] base their recommendations
on the historical data of the current user and other users. Content-based
filtering methods [3,4] involve the attribute information of users and
items, and the historical data of the current user. Hybrid methods [5,6]
integrate these two methods and take advantage of both. Recently, ensemble
methods [7,8] have been proposed to aggregate the predictions of base
algorithms.

The cold-start problem [6,9] is difficult in recommender systems. Researchers
have addressed the cold-start problem where either the user or the item is new.
The new item problem [5] is more often addressed, and the new user problem is
symmetric to it [3]. Naturally, content-based filtering methods [3,4] are
appropriate for these problems. However, the situation where both the user and
the item are new has seldom been considered. Since the historical data of both
the current user and the current item are unknown, this situation is more
challenging.

In this paper, we propose a cold-start recommendation approach for the new
situation based on granular association rules. This approach is also applicable 
to existing situations. First, we provide a means for describing users and items 
through information granules. Examples of information granules include "male
students," "thriller movies," and "adventure movies released in 1990s." Then we
provide a means for generating association rules between users and items. An
example granular association rule might be "60% male students rate 40% drama
movies released in 1990s; 20% users are male students and 5% movies are drama
released in 1990s." Here 20%, 5%, 60%, and 40% are the source coverage, the
target coverage, the source confidence, and the target confidence, respectively.
To obtain strong and meaningful rules, we need to specify thresholds for all 
four measures. Finally, we provide a means for recommending items to users 
using these rules. The quality of the recommender is evaluated mainly by the 
recommendation accuracy. 

There are already some works on granular association rules. The concept is
defined in [10,11], together with a number of algorithms that compute all
granular association rules meeting the thresholds of the four measures. A more
efficient algorithm, which takes advantage of rough set theory, is presented in
[12]. Two discretization approaches are studied in [13] for numeric data.
Multi-valued data are then addressed in [11] to obtain positive rules and discard
negative rules such as "male students rate movies that are not comedy." This is
because negative rules are uninteresting in such applications and they overwhelm
positive ones.

The main contribution of this paper is the validation of granular association
rules. In other words, we train and test granular association rules to study
their performance. This work is an important step toward the application of
granular association rules [10,11] as well as granular computing
[15,16,17,18,19,20]. Experiments are undertaken on the MovieLens dataset [21]
using our open source software Grale [22]. Results indicate that rule sets
perform similarly on the training and the testing sets. More importantly, the
appropriate setting of granules is essential to the application of granular
association rules.

2 Preliminaries 

In this section, we present some preliminary knowledge, especially granular
association rules.



2.1 Many-to-many entity-relationship systems 

First we revisit the definitions of information systems, binary relations and
many-to-many entity-relationship systems [10].

Definition 1. $S = (U, A)$ is an information system, where
$U = \{x_1, x_2, \ldots, x_n\}$ is the set of all objects,
$A = \{a_1, a_2, \ldots, a_m\}$ is the set of all attributes, and $a_j(x_i)$
is the value of $x_i$ on attribute $a_j$ for $i \in [1..n]$ and $j \in [1..m]$.



Two information systems are listed in Tables 1(a) and 1(b), respectively. In
Table 1(b), 1 indicates true, while 0 indicates false.

Table 1. A many-to-many entity-relationship system

(a) User

User-id   Age        Gender   Occupation
1         [18, 24]   M        technician
2         [50, 55]   F        other
3         [18, 24]   M        writer
943       [18, 24]   M        student


(b) Movie

Movie-id   Release-decade   Action   Adventure   Animation   ...   Western
1          1990s            0        0           0           ...   0
2          1980s            0        1           1           ...   0
3          1990s            0        0           0           ...   0
1,682      1960s            0        0           0           ...   0


(c) Rates

User-id\Movie-id 1 2 3 4 5 ... 1,682
1 1 1 ...
2 1 1 ... 1
3 1 ... 1
943 1 1 ... 1



Definition 2. Let $U = \{x_1, x_2, \ldots, x_n\}$ and
$V = \{y_1, y_2, \ldots, y_k\}$ be two sets of objects. Any
$R \subseteq U \times V$ is a binary relation from $U$ to $V$.



An example of a binary relation is given by Table 1(c), where U is the set of
users as indicated by Table 1(a) and V is the set of movies as indicated by
Table 1(b). A binary relation can be viewed as an information system. However,
in order to save storage space, it is more often stored in the database as a table
with two foreign keys.

Definition 3. A many-to-many entity-relationship system (MMER) is a 5-tuple
$ES = (U, A, V, B, R)$, where $(U, A)$ and $(V, B)$ are two information systems,
and $R \subseteq U \times V$ is a binary relation from $U$ to $V$.

An example of an MMER is given by Table 1.
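
As a rough illustration of Definition 3, the following Python sketch stores the two information systems as dictionaries and the relation R as a set of (user id, item id) pairs, mirroring the "table with two foreign keys" storage mentioned above. The class names, the helper `rated_by`, and the toy values are assumptions made for this example; they are not taken from the paper's software.

```python
from dataclasses import dataclass


@dataclass
class InformationSystem:
    """S = (U, A): each object id maps to its attribute values."""
    rows: dict  # object id -> {attribute name: value}

    def attributes(self):
        # All objects share the same attribute set A; read it off any row.
        return set(next(iter(self.rows.values())))


@dataclass
class MMER:
    """ES = (U, A, V, B, R) with R stored as (user id, item id) pairs."""
    users: InformationSystem
    items: InformationSystem
    rates: set

    def rated_by(self, user_id):
        # Items related to one user under R (hypothetical helper).
        return {i for (u, i) in self.rates if u == user_id}


# Toy values loosely modelled on Table 1; they are illustrative only.
users = InformationSystem({
    1: {"age": "[18,24]", "gender": "M", "occupation": "technician"},
    2: {"age": "[50,55]", "gender": "F", "occupation": "other"},
    3: {"age": "[18,24]", "gender": "M", "occupation": "writer"},
})
movies = InformationSystem({
    1: {"decade": "1990s", "adventure": 0},
    2: {"decade": "1980s", "adventure": 1},
})
es = MMER(users, movies, rates={(1, 2), (2, 1), (3, 2)})
print(es.users.attributes(), es.rated_by(1))
```

Storing R as a set of pairs keeps memory proportional to the number of ratings rather than to |U| x |V|, which matters when the relation is sparse.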

2.2 Information granules

Now we employ information granules [11,19] to describe users and items. In an
information system, any $A' \subseteq A$ induces an equivalence relation [23,24]

$$E_{A'} = \{(x, y) \in U \times U \mid \forall a \in A', a(x) = a(y)\},$$ (1)

and partitions $U$ into a number of disjoint subsets called blocks or granules.
The granule containing $x \in U$ is

$$E_{A'}(x) = \{y \in U \mid \forall a \in A', a(y) = a(x)\}.$$ (2)
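
Equations (1) and (2) can be sketched in a few lines of Python. The function names `partition` and `granule` and the toy user table are assumptions for illustration only.

```python
from collections import defaultdict

# Toy user table; ids and values are illustrative, not taken from MovieLens.
USERS = {
    1: {"age": "[18,24]", "gender": "M", "occupation": "technician"},
    2: {"age": "[50,55]", "gender": "F", "occupation": "other"},
    3: {"age": "[18,24]", "gender": "M", "occupation": "writer"},
    4: {"age": "[18,24]", "gender": "M", "occupation": "student"},
}


def partition(table, attrs):
    """Split the universe into the blocks induced by E_{A'} (Equation (1))."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        # Objects with identical values on every attribute in attrs share a block.
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def granule(table, attrs, x):
    """Return E_{A'}(x), the block containing x (Equation (2))."""
    return {y for y, row in table.items()
            if all(row[a] == table[x][a] for a in attrs)}


print(partition(USERS, ["gender"]))
print(granule(USERS, ["age", "gender"], 1))
```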

The following definition was employed by Yao and Deng [19].

Definition 4. A granule is a triple

$$G = (g, i(g), e(g)),$$ (3)

where $g$ is the name assigned to the granule, $i(g)$ is a representation of the
granule, and $e(g)$ is a set of objects that are instances of the granule.

According to Equation (2), $(A', x)$ determines a granule in an information
system. Hence $g = g(A', x)$ is a natural name for the granule. $i(g)$ can be
formalized as the conjunction of the respective attribute-value pairs, i.e.,

$$i(g(A', x)) = \bigwedge_{a \in A'} \langle a : a(x) \rangle.$$ (4)

$e(g)$ is given by

$$e(g(A', x)) = E_{A'}(x).$$ (5)

The support of the granule is the size of $e(g)$ divided by the size of the
universe, namely,

$$\mathrm{supp}(g(A', x)) = \mathrm{supp}\Big(\bigwedge_{a \in A'} \langle a : a(x) \rangle\Big) = \mathrm{supp}(E_{A'}(x)) = \frac{|E_{A'}(x)|}{|U|}.$$ (6)

Let $x \in U$ and $A'' \subset A' \subseteq A$. We have

$$e(g(A', x)) \subseteq e(g(A'', x)).$$ (7)

Consequently, we say that $g(A', x)$ is finer than $g(A'', x)$, or $g(A'', x)$ is
coarser than $g(A', x)$.
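
The support of a granule and the finer/coarser relationship can be checked on a small example. The sketch below is illustrative: the toy table and the function names are assumptions, and the check simply confirms that adding attributes can only shrink the extension of a granule.

```python
# Toy user table; values are illustrative only.
USERS = {
    1: {"gender": "M", "occupation": "technician"},
    2: {"gender": "F", "occupation": "other"},
    3: {"gender": "M", "occupation": "writer"},
    4: {"gender": "M", "occupation": "student"},
}


def granule(table, attrs, x):
    """E_{A'}(x): objects that agree with x on every attribute in attrs."""
    return {y for y, row in table.items()
            if all(row[a] == table[x][a] for a in attrs)}


def support(table, attrs, x):
    """Size of the granule's extension divided by the size of the universe."""
    return len(granule(table, attrs, x)) / len(table)


coarse = granule(USERS, ["gender"], 1)               # fewer attributes
fine = granule(USERS, ["gender", "occupation"], 1)   # more attributes

# The finer granule is always contained in the coarser one.
print(fine <= coarse)
print(support(USERS, ["gender"], 1), support(USERS, ["gender", "occupation"], 1))
```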



2.3 Granular association rules 

Now we discuss the means for connecting users and items. A granular association
rule [10,11] is an implication of the form

$$(GR): \bigwedge_{a \in A'} \langle a : a(x) \rangle \Rightarrow \bigwedge_{b \in B'} \langle b : b(y) \rangle,$$ (8)

where $A' \subseteq A$ and $B' \subseteq B$.

Before defining evaluation measures, let us look at an example of a granular
association rule: "60% male students rate 40% drama movies released in 1990s;
20% users are male students and 5% movies are drama released in 1990s." Here
20%, 5%, 60%, and 40% are the source coverage, the target coverage, the source
confidence, and the target confidence, respectively.

According to the definitions above, the set of objects meeting the left-hand
side of the granular association rule is

$$LH(GR) = E_{A'}(x),$$ (9)

while the set of objects meeting the right-hand side of the granular association
rule is

$$RH(GR) = E_{B'}(y).$$ (10)

The source coverage of GR is

$$scov(GR) = |LH(GR)| / |U|,$$ (11)

while the target coverage of GR is

$$tcov(GR) = |RH(GR)| / |V|.$$ (12)

There is a tradeoff between the source confidence and the target confidence.
Consequently, neither of them can be obtained directly from the rule. To
compute either one of them, we need to specify the threshold of the other. In
our computation, we adopt the following approach. Let $tc$ be the target
confidence threshold. The source confidence of the rule is

$$sconf(GR, tc) = \frac{|\{x \in LH(GR) \mid |R(x) \cap RH(GR)| / |RH(GR)| \geq tc\}|}{|LH(GR)|},$$ (13)

where $R(x) = \{y \in V \mid (x, y) \in R\}$ is the set of objects related to $x$.

Naturally, the source coverage and the target coverage indicate the generality 
of a rule, while the source confidence and the target confidence indicate the 
strength of a rule. 
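
The measures of this section can be illustrated on toy data. The sketch below assumes that the source confidence is the share of objects in LH(GR) that are related to at least a fraction tc of the objects in RH(GR), which matches the reading of the example rule given earlier. All names and values are hypothetical.

```python
def granule(table, attrs, x):
    """E_{A'}(x): objects that agree with x on every attribute in attrs."""
    return {y for y, row in table.items()
            if all(row[a] == table[x][a] for a in attrs)}


def coverage(block, universe):
    """scov or tcov: size of the granule relative to its universe."""
    return len(block) / len(universe)


def source_confidence(lh, rh, rates, tc):
    """Share of x in LH(GR) related to at least a fraction tc of RH(GR)."""
    if not lh or not rh:
        return 0.0
    strong = [x for x in lh
              if sum((x, y) in rates for y in rh) / len(rh) >= tc]
    return len(strong) / len(lh)


# Toy data; values are illustrative only.
USERS = {1: {"gender": "M"}, 2: {"gender": "F"},
         3: {"gender": "M"}, 4: {"gender": "M"}}
MOVIES = {
    1: {"decade": "1990s", "drama": 1},
    2: {"decade": "1990s", "drama": 1},
    3: {"decade": "1980s", "drama": 0},
    4: {"decade": "1990s", "drama": 0},
}
RATES = {(1, 1), (1, 2), (3, 1), (2, 3), (4, 4)}

LH = granule(USERS, ["gender"], 1)            # male users
RH = granule(MOVIES, ["decade", "drama"], 1)  # drama movies of the 1990s
SCOV = coverage(LH, USERS)
TCOV = coverage(RH, MOVIES)
SCONF = source_confidence(LH, RH, RATES, tc=0.5)
print(SCOV, TCOV, SCONF)
```

Raising tc makes the condition on each user stricter, so the source confidence can only stay the same or drop; this is the tradeoff described above.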



3 Training and testing granular association rules 

The set of all possible granular association rules might be very large. Therefore
we would like to train a relatively small rule set. The key issue is: which rules
should be included in this set for recommendation? One approach is to specify
thresholds of the four measures mentioned in Section 2.3. All rules satisfying
these thresholds are generated to build the recommender.

Problem 1. The granular association rule mining problem.

Input: An $ES = (U, A, V, B, R)$, a minimal source coverage threshold $ms$,
a minimal target coverage threshold $mt$, a minimal source confidence threshold
$sc$, and a minimal target confidence threshold $tc$.

Output: A rule set in which each rule satisfies $scov(GR) \geq ms$,
$tcov(GR) \geq mt$, $sconf(GR) \geq sc$, and $tconf(GR) \geq tc$.

Algorithm 1 A sandwich algorithm for granular association rule mining
Input: $ES = (U, A, V, B, R)$, $ms$, $mt$, $sc$, $tc$.
Output: A rule set.
Method: sandwich.

1: $SG(ms) = \{(A', x) \in 2^A \times U \mid |E_{A'}(x)| / |U| \geq ms\}$;
2: $TG(mt) = \{(B', y) \in 2^B \times V \mid |E_{B'}(y)| / |V| \geq mt\}$;
3: for each $g \in SG(ms)$ do
4:   for each $g' \in TG(mt)$ do
5:     $GR = (i(g) \Rightarrow i(g'))$;
6:     if $sconf(GR, tc) \geq sc$ then
7:       add $GR$ to the rule set;
8:     end if
9:   end for
10: end for



Algorithm 1 is a straightforward approach to obtaining this rule set. It is
quite similar to an algorithm presented in earlier work on granular association
rules. The difference is that the new algorithm stores the rule set in memory
instead of outputting it directly. Another algorithm presented in [12] can also
be revised for this problem to improve efficiency. Since the focus of this paper
is the effectiveness of the algorithm, we will not discuss it here.
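
A minimal Python sketch of the sandwich idea is given below. It is not the Grale implementation: it identifies each granule by its attribute-value description rather than by a pair (A', x), skips the empty attribute subset, assumes the reading of source confidence used in Section 2.3, and enumerates attribute subsets by brute force, which is only feasible for a handful of attributes. All names and data are hypothetical.

```python
from itertools import combinations


def granules(table, min_cov):
    """Map each description (attribute-value pairs) to its extension,
    keeping only granules whose coverage is at least min_cov."""
    attrs = sorted(next(iter(table.values())))
    found = {}
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            blocks = {}
            for obj, row in table.items():
                desc = tuple((a, row[a]) for a in subset)
                blocks.setdefault(desc, set()).add(obj)
            for desc, ext in blocks.items():
                if len(ext) / len(table) >= min_cov:
                    found[desc] = ext
    return found


def sconf(lh, rh, rates, tc):
    """Share of x in lh related to at least a fraction tc of rh."""
    strong = [x for x in lh
              if sum((x, y) in rates for y in rh) / len(rh) >= tc]
    return len(strong) / len(lh)


def sandwich(users, items, rates, ms, mt, sc, tc):
    """Pair every source granule with every target granule and keep
    the rules whose source confidence reaches sc."""
    rules = []
    for src, lh in granules(users, ms).items():
        for tgt, rh in granules(items, mt).items():
            if sconf(lh, rh, rates, tc) >= sc:
                rules.append((src, tgt))
    return rules


# Toy data; values are illustrative only.
USERS = {
    1: {"gender": "M", "occupation": "student"},
    2: {"gender": "F", "occupation": "other"},
    3: {"gender": "M", "occupation": "student"},
    4: {"gender": "M", "occupation": "writer"},
}
MOVIES = {
    1: {"decade": "1990s", "genre": "drama"},
    2: {"decade": "1990s", "genre": "drama"},
    3: {"decade": "1980s", "genre": "comedy"},
    4: {"decade": "1990s", "genre": "comedy"},
}
RATES = {(1, 1), (1, 2), (3, 1), (3, 2), (4, 1), (2, 3)}

RULES = sandwich(USERS, MOVIES, RATES, ms=0.5, mt=0.5, sc=0.6, tc=0.5)
for source, target in RULES:
    print(source, "=>", target)
```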

For each user, we recommend items of interest using the rule set. All rules
that match the user are fired for recommendation. Hence some users may have
many recommendations, and some may have very few. The performance of the
recommender is evaluated mainly by the recommendation accuracy. Formally,
let the number of recommended items be M and the number of successful
recommendations be N; the accuracy is then N/M.
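
The recommendation step and the accuracy N/M might be sketched as follows. Two assumptions are made here that the text does not spell out: a fired rule recommends every item in its target granule, and a recommendation counts as successful when the (user, item) pair belongs to the Rates relation. Names and data are hypothetical.

```python
def matches(row, description):
    """True if the object satisfies every attribute-value pair."""
    return all(row.get(a) == v for a, v in description)


def recommend(user_row, rules, items):
    """Fire every rule whose source side matches the user and collect
    the items described by the target sides."""
    recs = set()
    for source, target in rules:
        if matches(user_row, source):
            recs |= {i for i, row in items.items() if matches(row, target)}
    return recs


def accuracy(users, items, rules, rates):
    """N / M: successful recommendations over all recommendations."""
    m = n = 0
    for u, row in users.items():
        for i in recommend(row, rules, items):
            m += 1
            n += (u, i) in rates
    return n / m if m else 0.0


# Toy data; values are illustrative only.
USERS = {1: {"gender": "M"}, 2: {"gender": "F"}, 3: {"gender": "M"}}
MOVIES = {1: {"genre": "drama"}, 2: {"genre": "drama"}, 3: {"genre": "comedy"}}
RATES = {(1, 1), (1, 2), (3, 1), (2, 3)}
RULES = [((("gender", "M"),), (("genre", "drama"),))]

ACC = accuracy(USERS, MOVIES, RULES, RATES)
print(ACC)
```

In this toy case users 1 and 3 each receive two recommendations and user 2 receives none, which illustrates why the number of recommendations varies from user to user.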

We will compare five training and testing scenarios. 

1. Random recommendation. There is no training stage. An item is randomly 
recommended to a user. This is a baseline approach since a recommender 
which is not significantly better than a random one is simply useless. 

2. Test the training set. It is interesting to test the rule set on the training set. 
Since only attribute values can be employed, the performance of the rule set 
may not be high. 

3. Divide the user set into training and testing sets. This scenario
corresponds to the new user cold-start problem.

4. Divide the item set into training and testing sets. This scenario
corresponds to the new item cold-start problem.

5. Divide both the user and the item sets. In this scenario, both users and items
are new. Hence the problem is very challenging.
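
The fifth scenario can be sketched as a pair of independent random splits. The sketch assumes that the training relation contains only pairs of training users and training items, and likewise for testing; the function names, the seed handling, and the default training fraction are choices made for this example.

```python
import random


def split(ids, train_fraction, rng):
    """Randomly split ids into disjoint training and testing sets."""
    ids = sorted(ids)
    rng.shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return set(ids[:cut]), set(ids[cut:])


def restrict(rates, users, items):
    """Keep only the pairs whose user and item both belong to the given sets."""
    return {(u, i) for (u, i) in rates if u in users and i in items}


def both_new_scenario(user_ids, item_ids, rates, train_fraction=0.6, seed=0):
    """Split users and items separately so that testing users and testing
    items are both unseen during training."""
    rng = random.Random(seed)
    train_u, test_u = split(user_ids, train_fraction, rng)
    train_i, test_i = split(item_ids, train_fraction, rng)
    train = (train_u, train_i, restrict(rates, train_u, train_i))
    test = (test_u, test_i, restrict(rates, test_u, test_i))
    return train, test


# Toy relation in which every user rated every item.
ALL_PAIRS = {(u, i) for u in range(1, 11) for i in range(1, 6)}
TRAIN, TEST = both_new_scenario(range(1, 11), range(1, 6), ALL_PAIRS)
print(len(TRAIN[2]), len(TEST[2]))
```

Scenarios 3 and 4 follow from the same helpers by splitting only the user ids or only the item ids and keeping the other set whole.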

4 Experiments 

In this section, we try to answer the following questions through experimentation.

1. How does the recommender perform for the new user, new item, and both-new
cold-start problems?

2. How does the number of recommendations change for different thresholds? 

3. How does the performance of the recommender vary for the training/testing 
sets? 

4. How does the performance of the recommender change for different threshold 
settings? 

4.1 Dataset 

We tested granular association rules on the MovieLens dataset [21], which is
widely used in recommender systems (see, e.g., [6,9]). The database schema is as
follows.

• User (userID, age, gender, occupation)

• Movie (movieID, release-year, genre)

• Rates (userID, movieID)

We use the version with 943 users and 1,682 movies. The data are preprocessed to
comply with Definition 3 as follows. The original Rates relation contains the
rating of movies on a 5-point scale, while we only consider whether or not a user
has rated a movie. The user age is discretized to 6 intervals as follows: [7,22),
[22,27), [27,31), [31,39), [39,48) and [48,73]. The release year is discretized to
7 intervals as follows: [1922,1980), [1980,1993), [1993,1994), [1994,1995),
[1995,1996), [1996,1997) and [1997,1998]. The genre is a multi-valued attribute.
Therefore we scale it to 18 boolean attributes and deal with it using the approach
proposed in [14].
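
The preprocessing steps above might look as follows in Python. The age cut points are the interval boundaries listed in the text; the function names, the input formats, and the three genre names (taken from the column headings of Table 1(b)) are assumptions for illustration, not the paper's actual preprocessing code.

```python
from bisect import bisect_right

# Interval boundaries for the user age, as listed in the text.
AGE_CUTS = [22, 27, 31, 39, 48]
AGE_LABELS = ["[7,22)", "[22,27)", "[27,31)", "[31,39)", "[39,48)", "[48,73]"]


def discretize_age(age):
    """Map a numeric age to the label of the interval that contains it."""
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]


def binarize(ratings):
    """Drop the rating value: keep only which user rated which movie."""
    return {(user, movie) for user, movie, _stars in ratings}


def genre_flags(genres, all_genres):
    """Scale a multi-valued genre attribute to one boolean flag per genre."""
    return {g: int(g in genres) for g in all_genres}


print(discretize_age(21), discretize_age(22), discretize_age(73))
print(binarize([(1, 1, 5), (1, 2, 3), (2, 1, 1)]))
print(genre_flags(["Action"], ["Action", "Adventure", "Animation"]))
```

Using `bisect_right` places a value equal to a cut point in the interval that starts at that cut point, which matches the half-open intervals in the text.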

4.2 Results 

We undertake three sets of experiments to answer the questions raised at the 
beginning of the section one by one. The settings are as follows: the training 
set percentage is 60%, and sc = tc = 0.3. Each experiment is repeated 20 times 
with different sampling of training and testing sets, and the average accuracy is 
computed. 

Fig. 1 shows the accuracy of the recommender on the new user, new item, and
both-new scenarios. The random recommender, which has an accuracy close to
0.062, is also illustrated for comparison. The result indicates that the new item
problem is easier and the respective recommendations are more meaningful. The
both-new scenario is the hardest since the least information is available.
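
As a sanity check on the random baseline, note that a uniformly random recommendation succeeds with probability equal to the density of the Rates relation, provided success means that the recommended pair belongs to Rates. The short computation below assumes the 100,000-rating release of MovieLens, which is the commonly distributed version with 943 users and 1,682 movies; the rating count is not stated in the text.

```python
N_USERS, N_MOVIES = 943, 1682   # sizes stated in Section 4.1
N_RATINGS = 100_000             # assumption: the 100K-rating MovieLens release

# Fraction of all (user, movie) pairs that appear in the Rates relation.
DENSITY = N_RATINGS / (N_USERS * N_MOVIES)
print(round(DENSITY, 3))
```

Under these assumptions the density is roughly 0.063, which is in line with the reported accuracy of the random recommender; small differences are to be expected because the experiments sample training and testing subsets.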

Fig. 2 indicates that the number of recommendations decreases dramatically
with the increase of ms and mt.

Fig. 3 indicates that the recommender performs differently as ms and mt change.
The following observations from this figure are especially interesting.

1. When ms and mt are appropriately set (0.04), the recommender has the
maximal accuracy. With the increase or decrease of these thresholds, the
performance of the recommender decreases rapidly. This is the most important
observation from the experiment.

2. The performance of the recommender does not vary much on the training
and the testing sets. This phenomenon indicates that the recommender is
stable.

3. The recommender achieves the best performance for both the training and
the testing sets with the same settings. This is somewhat surprising, since it
suggests that the recommender does not suffer from over-fitting.

Fig. 2. Comparison of the number of recommendations (horizontal axis: ms(mt))

5 Conclusions 

In this paper, granular association rules are trained to build a cold-start
recommender. We specify different granules in terms of source coverage and
target coverage to obtain different recommenders. Experimental results indicate
that the appropriate selection of granules is essential to the performance of
the recommender. In the future, we will design algorithms for effective granule
selection.

Fig. 3. Comparison of different granules (curves: Training and Testing;
horizontal axis: ms(mt), from 0.010 to 0.045)